Linear Models

| Model | Mathematical Expression | Description |
|---|---|---|
| Ordinary Least Squares | $\min_{w} \lVert Xw - y \rVert_2^2$ | Fits a linear model whose coefficients $w$ minimize the residual sum of squares between the observed targets $y$ and the predictions $Xw$. Sensitive to outliers. When features are strongly correlated (multicollinearity), the coefficient estimates become unstable and have high variance. |
| Ridge Regression | $\min_{w} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2$ | Adds an L2 penalty that shrinks the coefficients toward zero, which addresses the instability of Ordinary Least Squares. More robust to multicollinearity; the penalty strength $\alpha \ge 0$ controls the bias-variance trade-off. |
| Lasso Regression | $\min_{w} \frac{1}{2n_{\text{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_1$ | Adds an L1 penalty that drives some coefficients exactly to zero, producing a sparse coefficient vector. Useful for feature selection, since the fitted model uses fewer features. |
| Elastic Net | $\min_{w} \frac{1}{2n_{\text{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1-\rho)}{2} \lVert w \rVert_2^2$ | Combines L1 and L2 penalties through two parameters: the overall strength $\alpha$ and the L1 ratio $\rho \in [0, 1]$. Setting $\rho = 1$ gives Lasso, and $\rho = 0$ gives a pure L2 (Ridge-style) penalty. Useful when several features are correlated with one another, a case where Lasso tends to keep only one of them. |
| Logistic Regression | $\min_{w, c} \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i (X_i^T w + c)\right)\right)$ | Used for binary classification with labels $y_i \in \{-1, +1\}$. It models $P(y_i = 1 \mid X_i)$ with the logistic function $1 / (1 + \exp(-(X_i^T w + c)))$, so its outputs can be read as class probabilities. The objective shown has no penalty term; in practice a regularization term (typically L2) is often added. |
| Polynomial Regression | $\min_{w} \lVert \Phi(X) w - y \rVert_2^2$, where $\Phi(X)$ contains the polynomial terms of the features up to a chosen degree | Extends linear models by fitting them on polynomial features. The model stays linear in $w$ but can fit non-linear patterns in $X$. High-degree polynomials overfit easily. |
| RidgeCV | Same as Ridge, with $\alpha$ selected by cross-validation | Ridge regression with built-in cross-validation over candidate values of $\alpha$, which selects the regularization strength automatically. |
| LassoCV | Same as Lasso, with $\alpha$ selected by cross-validation | Lasso regression with built-in cross-validation over a path of $\alpha$ values. Well suited to high-dimensional data; automates the choice of $\alpha$. |
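The Ordinary Least Squares and Ridge rows can be compared directly in a minimal NumPy sketch. The helper names `ols` and `ridge` are hypothetical, not a library API. The sketch assumes centered data with no intercept and uses the closed-form Ridge solution $w = (X^T X + \alpha I)^{-1} X^T y$. Two nearly identical columns show the multicollinearity problem from the OLS row.

```python
import numpy as np

def ols(X, y):
    """Coefficients minimizing ||Xw - y||_2^2 (assumption: no intercept term)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def ridge(X, y, alpha):
    """Closed-form minimizer of ||Xw - y||_2^2 + alpha * ||w||_2^2 (no intercept)."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 1e-3 * rng.normal(size=100)   # almost a copy of x1: strong multicollinearity
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=100)

w_ols = ols(X, y)
w_ridge = ridge(X, y, alpha=1.0)
print("OLS:  ", w_ols)    # the two coefficients can be large with opposite signs
print("Ridge:", w_ridge)  # the shared effect is split between the two columns
```

With $\alpha = 0$ the Ridge formula reduces to OLS. For any $\alpha > 0$ the Ridge coefficient vector has a strictly smaller norm than the OLS solution, which is the shrinkage that trades a little bias for lower variance.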
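The Lasso and Elastic Net objectives in the table can be minimized with cyclic coordinate descent, where each coefficient is updated in turn with a soft-thresholding step. The following is a minimal NumPy sketch of that technique for exactly the $\frac{1}{2n_{\text{samples}}}$-scaled objectives above, assuming no intercept. The function names `soft_threshold` and `elastic_net_cd` are hypothetical. Coordinate descent is also the solver scikit-learn documents for its Lasso and ElasticNet classes, but this code is not that implementation. Setting `rho=1.0` recovers Lasso.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; returns exactly 0.0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net_cd(X, y, alpha, rho=1.0, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent (hypothetical helper) for
        (1 / (2 n)) ||Xw - y||_2^2 + alpha * rho * ||w||_1 + alpha * (1 - rho) / 2 * ||w||_2^2
    rho = 1.0 gives the Lasso objective. Assumption: no intercept is fitted."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * ||X_j||^2 for each column j
    residual = y.astype(float)          # copy of y, equal to y - Xw while w = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # (1/n) * X_j^T (residual with feature j's contribution added back)
            z = X[:, j] @ residual / n + col_sq[j] * w[j]
            w_j = soft_threshold(z, alpha * rho) / (col_sq[j] + alpha * (1.0 - rho))
            residual -= X[:, j] * (w_j - w[j])   # keep residual equal to y - Xw
            max_change = max(max_change, abs(w_j - w[j]))
            w[j] = w_j
        if max_change < tol:
            break
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:2] = [3.0, -2.0]               # only the first two features matter
y = X @ true_w + 0.1 * rng.normal(size=200)

w_lasso = elastic_net_cd(X, y, alpha=0.1, rho=1.0)
w_enet = elastic_net_cd(X, y, alpha=0.1, rho=0.5)
print(np.round(w_lasso, 3))   # the irrelevant features are expected to be exactly zero
print(np.round(w_enet, 3))
```

The L1 term produces sparsity through the soft-thresholding step: any coefficient whose correlation with the partial residual is at most $\alpha\rho$ is set to exactly zero, not merely made small. The L2 term only enlarges the denominator, which shrinks the surviving coefficients further.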
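The Logistic Regression objective can be evaluated and minimized in a few lines. The following is a minimal sketch that assumes labels in $\{-1, +1\}$ and keeps the table's unpenalized objective, minimized by plain gradient descent. The helper names and the learning rate are hypothetical choices, not a library API. The loss uses `np.logaddexp(0, -m)`, which computes $\log(1 + \exp(-m))$ without overflow.

```python
import numpy as np

def sigmoid(z):
    # 0.5 * (1 + tanh(z / 2)) equals 1 / (1 + exp(-z)) but never overflows
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def logistic_loss(w, c, X, y):
    """Sum of log(1 + exp(-y_i (X_i^T w + c))) for labels y_i in {-1, +1}."""
    margins = y * (X @ w + c)
    return np.logaddexp(0.0, -margins).sum()

def logistic_grad(w, c, X, y):
    """Gradient of logistic_loss with respect to w and c."""
    margins = y * (X @ w + c)
    coef = -y * sigmoid(-margins)      # derivative of each term w.r.t. X_i^T w + c
    return X.T @ coef, coef.sum()

def fit_logistic(X, y, lr=1.0, n_iter=2000):
    """Gradient descent on the mean loss (assumption: no penalty term)."""
    n, p = X.shape
    w, c = np.zeros(p), 0.0
    for _ in range(n_iter):
        gw, gc = logistic_grad(w, c, X, y)
        w -= lr * gw / n
        c -= lr * gc / n
    return w, c

def predict_proba(w, c, X):
    """Estimated P(y = +1 | x) from the logistic function."""
    return sigmoid(X @ w + c)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
p_true = sigmoid(X @ np.array([2.0, -1.0]) + 0.5)
y = np.where(rng.random(200) < p_true, 1.0, -1.0)   # noisy, non-separable labels

w, c = fit_logistic(X, y)
print(logistic_loss(np.zeros(2), 0.0, X, y), logistic_loss(w, c, X, y))
print(predict_proba(w, c, X[:3]))
```

At $w = 0, c = 0$ every term equals $\log 2$, so the starting loss is $n \log 2$. Without a penalty, the minimizer is finite only when the classes overlap, as they do here. On perfectly separable data the loss keeps decreasing as $\lVert w \rVert$ grows, which is one reason a regularization term is usually added.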
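Polynomial Regression is ordinary least squares on expanded features $\Phi(X)$. A minimal sketch for a single input feature builds $\Phi$ with `np.vander`. The helper names `polynomial_features` and `fit_polynomial` are hypothetical. The sketch recovers an exact cubic and then shows why training error alone cannot choose the degree.

```python
import numpy as np

def polynomial_features(x, degree):
    """Design matrix with columns [1, x, x^2, ..., x^degree] for a 1-D array x."""
    return np.vander(x, N=degree + 1, increasing=True)

def fit_polynomial(x, y, degree):
    """Ordinary least squares on the expanded features: min_w ||Phi(x) w - y||_2^2."""
    Phi = polynomial_features(x, degree)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

x = np.linspace(-2.0, 2.0, 30)
y = 1.0 - 2.0 * x + 0.5 * x ** 3               # a cubic, so degree 3 fits it exactly
w = fit_polynomial(x, y, degree=3)
print(np.round(w, 6))                           # coefficients of 1, x, x^2, x^3

# With noise, training error never increases as the degree grows (nested models),
# so it cannot be used on its own to pick the degree.
rng = np.random.default_rng(0)
y_noisy = y + rng.normal(scale=0.3, size=x.size)
train_mse = []
for d in range(1, 10):
    w_d = fit_polynomial(x, y_noisy, d)
    train_mse.append(np.mean((polynomial_features(x, d) @ w_d - y_noisy) ** 2))
print(np.round(train_mse, 4))
```

Because each higher-degree model contains the lower-degree ones, its training error can only match or improve on theirs. Overfitting is therefore detected on held-out data, or controlled with one of the regularized models in the table.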
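For the RidgeCV row, scikit-learn's `RidgeCV` by default scores each candidate $\alpha$ with an efficient leave-one-out shortcut instead of refitting the model $n$ times. The shortcut relies on the Ridge hat matrix $H = X (X^T X + \alpha I)^{-1} X^T$: the leave-one-out residual for sample $i$ is $(y_i - \hat{y}_i) / (1 - H_{ii})$. The following minimal NumPy sketch illustrates the idea; it is not `RidgeCV`'s code. It assumes no intercept, and the helper names are hypothetical. A brute-force version is included to check the shortcut.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form Ridge coefficients (assumption: no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

def loo_mse_closed_form(X, y, alpha):
    """Leave-one-out MSE of Ridge via the hat-matrix shortcut e_i / (1 - H_ii)."""
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T)
    residuals = y - H @ y
    loo_residuals = residuals / (1.0 - np.diag(H))
    return np.mean(loo_residuals ** 2)

def loo_mse_brute_force(X, y, alpha):
    """Refit n times, each time holding out one sample."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w = ridge_fit(X[keep], y[keep], alpha)
        errors.append((y[i] - X[i] @ w) ** 2)
    return np.mean(errors)

def select_alpha(X, y, alphas):
    """Pick the candidate alpha with the lowest leave-one-out MSE."""
    scores = [loo_mse_closed_form(X, y, a) for a in alphas]
    return alphas[int(np.argmin(scores))], scores

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ np.array([1.0, 0.5, 0.0, -1.0, 2.0]) + 0.5 * rng.normal(size=40)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_alpha, scores = select_alpha(X, y, alphas)
print(best_alpha, np.round(scores, 4))
```

The shortcut works because Ridge predictions are a fixed linear function of $y$ for a given $\alpha$. The Lasso fit is not linear in $y$, which is why LassoCV instead relies on k-fold cross-validation along a path of $\alpha$ values.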